Gracefully skip parsing errors when source files use unsupported encoding #754
uhafner left a comment
Thanks for the fix! I'm not sure whether it now skips the characters or renders them correctly. Can you add an assertion that makes the new output clear?
}

private List<String> readSourceLines(final Path sourcePath, final Charset charset) throws IOException {
    try (var reader = new java.io.BufferedReader(new InputStreamReader(Files.newInputStream(sourcePath),
Suggested change:
- try (var reader = new java.io.BufferedReader(new InputStreamReader(Files.newInputStream(sourcePath),
+ try (var reader = new BufferedReader(new InputStreamReader(Files.newInputStream(sourcePath),
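To make the "skip or render" question above concrete, here is a minimal sketch of how two JDK reading paths treat a byte that is invalid in the requested charset. It assumes the fix relies on the default behavior of `InputStreamReader`, which substitutes U+FFFD for malformed input instead of throwing; the diff suggests this, but the thread does not confirm it. `LenientReadSketch`, `readLeniently`, and `strictReadFails` are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class LenientReadSketch {
    // Hypothetical helper with the same shape as readSourceLines in the diff.
    // InputStreamReader replaces malformed bytes with U+FFFD instead of throwing.
    static List<String> readLeniently(final Path sourcePath, final Charset charset) throws IOException {
        try (var reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(sourcePath), charset))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    // Files.readAllLines uses a strict decoder and reports malformed input
    // as a MalformedInputException.
    static boolean strictReadFails(final Path sourcePath, final Charset charset) throws IOException {
        try {
            Files.readAllLines(sourcePath, charset);
            return false;
        }
        catch (MalformedInputException exception) {
            return true;
        }
    }

    public static void main(final String[] args) throws IOException {
        Path file = Files.createTempFile("Example", ".m");
        // 0xE9 is 'é' in windows-1252 and ISO-8859-1, but malformed as UTF-8 here
        Files.write(file, new byte[] {'C', 'a', 'f', (byte) 0xE9, '\n'});

        System.out.println("strict UTF-8 read fails: " + strictReadFails(file, StandardCharsets.UTF_8));
        System.out.println("lenient UTF-8 read: " + readLeniently(file, StandardCharsets.UTF_8));
        System.out.println("lenient ISO-8859-1 read: " + readLeniently(file, StandardCharsets.ISO_8859_1));
        Files.delete(file);
    }
}
```

Under this assumption, a wrong charset yields a replacement character rather than an exception, while the matching charset yields the original character. An assertion on the painted output is therefore what distinguishes the two cases.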
Path sourceFile = workspace.resolve("Example.m");
Files.write(sourceFile, List.of(
        "function y = example()",
        "% Copyright 2026, Caf\u00e9 Corporation",
Suggested change:
- "% Copyright 2026, Caf\u00e9 Corporation",
+ "% Copyright 2026, Café Corporation",
uhafner left a comment
This is way too complex. Please simplify.
byte[] paintedBytes = readPaintedBytesFromNestedZip(outerZipPath);

assertThat(containsBytes(paintedBytes, CAFE_NBSP_CORPORATION_UTF8))
        .as("Painted HTML must contain UTF-8 bytes for 'Cafe-acute NBSP Corporation' "
                + "(43 61 66 C3 A9 C2 A0 43 6F 72 70 6F 72 61 74 69 6F 6E). "
                + "Actual painted bytes: " + toHex(paintedBytes))
        .isTrue();

assertThat(containsBytes(paintedBytes, REPLACEMENT_CHAR_UTF8))
        .as("Painted HTML must NOT contain UTF-8 replacement character EF BF BD "
                + "— that would mean 0xE9 was not decoded correctly as windows-1252")
        .isFalse();
Simplify that code as follows, and remove the byte handling:
var renderedText = new String(paintedBytes, StandardCharsets.UTF_8).replace("\u00A0", " ");
assertThat(renderedText).contains("Copyright 2026, Café Corporation");
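The suggestion above can be tried in isolation. The following is a minimal sketch in plain Java without AssertJ, assuming the painted HTML places a non-breaking space between "Café" and "Corporation", as the byte sequence in the original assertion indicates. `RenderedTextSketch` and `normalize` are hypothetical names, and the sample bytes stand in for the real painted output.

```java
import java.nio.charset.StandardCharsets;

public class RenderedTextSketch {
    // Decode the painted bytes as UTF-8 and map non-breaking spaces to plain
    // spaces so that the expected text can be written as an ordinary string.
    static String normalize(final byte[] paintedBytes) {
        return new String(paintedBytes, StandardCharsets.UTF_8).replace("\u00A0", " ");
    }

    public static void main(final String[] args) {
        // Stand-in for the painted HTML: 'é' followed by a non-breaking space
        byte[] painted = "% Copyright 2026, Caf\u00E9\u00A0Corporation"
                .getBytes(StandardCharsets.UTF_8);

        String renderedText = normalize(painted);

        System.out.println(renderedText.contains("Copyright 2026, Caf\u00E9 Corporation"));
        // A wrongly decoded source byte would show up as U+FFFD in the text
        System.out.println(renderedText.contains("\uFFFD"));
    }
}
```

Comparing decoded text instead of raw bytes keeps the expectation readable. A separate check for U+FFFD is optional, because the `contains` assertion already fails if the accented character was replaced.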
…orrectly in painted source files
☀️ Quality Monitor
Tests
Coverage for New Code
〰️ Line Coverage: 100.00%
Coverage for Whole Project
〰️ Line Coverage: 76.88%
Style
Bugs
Vulnerabilities
🛡️ OWASP Dependency Check: 201 vulnerabilities
Software Metrics
🌀 Cyclomatic Complexity: 1032 (total)
📌 Reference Results
Delta reports computed against the reference results of c0a47c7 in workflow run 26680624830.
🚦 Quality Gates
Overall Status: ✅ SUCCESS
✅ Passed Gates
Created by Quality Monitor v4.14.3 (#a8d815d). More details are shown in the GitHub Checks Result.
Thanks!
Coverage paint fails for files containing extended ascii characters
Fixes #678
Testing done
Submitter checklist